{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline\n",
    "import os\n",
    "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n",
    "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\" "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building an End-to-End Question-Answering System With BERT\n",
    "\n",
    "In this notebook, we build a practical, end-to-end Question-Answering (QA) system with BERT in rougly 3 lines of code.  We will treat a corpus of text documents as a knowledge base to which we can ask questions and retrieve exact answers using [BERT](https://arxiv.org/abs/1810.04805). This goes beyond simplistic keyword searches.\n",
    "\n",
    "For this example, we will use the [20 Newsgroup dataset](http://qwone.com/~jason/20Newsgroups/) as the text corpus.  As a collection of newsgroup postings which contains an abundance of opinions and debates, the corpus is not ideal as a knowledgebase.  It is better to use fact-based documents such as Wikipedia articles or even news articles.  However, this dataset will suffice for this example.\n",
    "\n",
    "Let us begin by loading the dataset into an array using **scikit-learn** and importing *ktrain* modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load 20newsgroups datset into an array\n",
    "from sklearn.datasets import fetch_20newsgroups\n",
    "remove = ('headers', 'footers', 'quotes')\n",
    "newsgroups_train = fetch_20newsgroups(subset='train', remove=remove)\n",
    "newsgroups_test = fetch_20newsgroups(subset='test', remove=remove)\n",
    "docs = newsgroups_train.data +  newsgroups_test.data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ktrain\n",
    "from ktrain import text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 1:  Index the Documents\n",
    "\n",
    "We will first index the documents into a search engine that will be used to quickly retrieve documents that are likely to contain answers to a question. To do so, we must choose an index location, which must be a folder that does not already exist. \n",
    "\n",
    "Since the newsgroup postings are small and fit in memory, we wil set `commit_every` to a large value to speed up the indexing process. This means results will not be written until the end.  If you experience issues, you can lower this value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "INDEXDIR = '/tmp/myindex'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "text.SimpleQA.initialize_index(INDEXDIR)\n",
    "text.SimpleQA.index_from_list(docs, INDEXDIR, commit_every=len(docs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For documents sets that are too large to be loaded into a Python list, you can use `SimpleQA.index_from_folder`, which will crawl a folder and index all plain text documents.\n",
    "\n",
    "The above steps need to only be performed once. Once an index is already created, you can skip this step and proceed directly to **STEP 2** to begin using your system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 2: Create a QA instance\n",
    "\n",
    "Next, we create a QA instance.  This step will automatically download the BERT SQUAD model if it does not already exist on your system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "qa = text.SimpleQA(INDEXDIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's it!  In roughly **3 lines of code**, we have built an end-to-end QA system that can now be used to generate answers to questions.  Let's ask our system some questions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 3:  Ask Questions\n",
    "\n",
    "We will invoke the `ask` method to issue questions to the text corpus we indexed and retrieve answers.  We will also use the `qa.display` method to nicely display the top 5 results in this Jupyter notebook. The answers are inferred using a BERT model trained on the SQUAD dataset.  Since the model is combing through paragraphs and sentences to find an answer, it may take a minute or two to return results.\n",
    "\n",
    "Note also that the 20 Newsgroup Dataset covers events in the early to mid 1990s, so references to recent events will not exist.\n",
    "\n",
    "#### Space Question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Candidate Answer</th>\n",
       "      <th>Context</th>\n",
       "      <th>Confidence</th>\n",
       "      <th>Document Reference</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>in october of 1997</td>\n",
       "      <td><div> cassini is scheduled for launch aboard a titan iv / centaur  <font color='red'>in october of 1997</font> .</div></td>\n",
       "      <td>0.348673</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>on january 26,1962</td>\n",
       "      <td><div>ranger 3, launched  <font color='red'>on january 26,1962</font> , was intended to land an instrument capsule on the surface of the moon, but problems during the launch caused the probe to miss the moon and head into solar orbit.</div></td>\n",
       "      <td>0.195162</td>\n",
       "      <td>8525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>on november 5,1964</td>\n",
       "      <td><div>mariner 3, launched  <font color='red'>on november 5,1964</font> , was lost when its protective shroud failed to eject as the craft was placed into interplanetary space.</div></td>\n",
       "      <td>0.162835</td>\n",
       "      <td>8525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>launched october 18,1962</td>\n",
       "      <td><div>ranger 5,  <font color='red'>launched october 18,1962</font>  and similar to ranger 3 and 4, lost all solar panel and battery power enroute and eventually missed the moon and drifted off into solar orbit.</div></td>\n",
       "      <td>0.077810</td>\n",
       "      <td>8525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2001</td>\n",
       "      <td><div> possible launch dates : 1996 for imaging orbiter,  <font color='red'>2001</font>  for rover.</div></td>\n",
       "      <td>0.069741</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "answers = qa.ask('When did the Cassini probe launch?')\n",
    "qa.display_answers(answers[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the top candidate answer indicates that the Cassini space probe was launched in October of 1997, which appears to be correct.  The correct answer will not always be the top answer, but it is in this case.  \n",
    "\n",
    "Note that, since we used `index_from_list` to index documents, the last column shows the list index associated with the newsgroup posting containing the answer, which can be used to peruse the entire document containing the answer.  If using `index_from_folder` to index documents, the last column will show the relative path and filename of the document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Archive-name: space/new_probes\n",
      "Last-modified: $Date: 93/04/01 14:39:17 $\n",
      "\n",
      "UPCOMING PLANETARY PROBES - MISSIONS AND SCHEDULES\n",
      "\n",
      "    Information on upcoming or currently active missions not mentioned below\n",
      "    would be welcome. Sources: NASA fact sheets, Cassini Mission Design\n",
      "    team, ISAS/NASDA launch schedules, press kits.\n",
      "\n",
      "\n",
      "    ASUKA (ASTRO-D) - ISAS (Japan) X-ray astronomy satellite, launched into\n",
      "    Earth orbit on 2/20/93. Equipped with large-area wide-wavelength (1-20\n",
      "    Angstrom) X-ray telescope, X-ray CCD cameras, and imaging gas\n",
      "    scintillation proportional counters.\n",
      "\n",
      "\n",
      "    CASSINI - Saturn orbiter and Titan atmosphere probe. Cassini is a joint\n",
      "    NASA/ESA project designed to accomplish an exploration of the Saturnian\n",
      "    system with its Cassini Saturn Orbiter and Huygens Titan Probe. Cassini\n",
      "    is scheduled for launch aboard a Titan IV/Centaur in October of 1997.\n",
      "    After gravity assists of Venus, Earth and Jupiter in a VVEJGA\n",
      "    trajectory, the spacecraft will arrive at Saturn in June of 2004. Upon\n",
      "    arrival, the Cassini spacecraft performs several maneuvers to achieve an\n",
      "    orbit around Saturn. Near the end of this initial orbit, the Huygens\n",
      "    Probe separates from the Orbiter and descends through the atmosphere of\n",
      "    Titan. The Orbiter relays the Probe data to Earth for about 3 hours\n",
      "    while the Probe enters and traverses the cloudy atmosphere to the\n",
      "    surface. After the completion of the Probe mission, the Orbiter\n",
      "    continues touring the Saturnian system for three and a half years. Titan\n",
      "    synchronous orbit trajectories will allow about 35 flybys of Titan and\n",
      "    targeted flybys of Iapetus, Dione and Enceladus. The objectives of the\n",
      "    mission are threefold: conduct detailed studies of Saturn's atmosphere,\n",
      "    rings and magnetosphere; conduct close-up studies of Saturn's\n",
      "    satellites, and characterize Titan's atmosphere and surface.\n",
      "\n",
      "    One of the most intriguing aspects of Titan is the possibility that its\n",
      "    surface may be covered in part with lakes of liquid hydrocarbons that\n",
      "    result from photochemical processes in its upper atmosphere. These\n",
      "    hydrocarbons condense to form a global smog layer and eventually rain\n",
      "    down onto the surface. The Cassini orbiter will use onboard radar to\n",
      "    peer through Titan's clouds and determine if there is liquid on the\n",
      "    surface. Experiments aboard both the orbiter and the entry probe will\n",
      "    investigate the chemical processes that produce this unique atmosphere.\n",
      "\n",
      "    The Cassini mission is named for Jean Dominique Cassini (1625-1712), the\n",
      "    first director of the Paris Observatory, who discovered several of\n",
      "    Saturn's satellites and the major division in its rings. The Titan\n",
      "    atmospheric entry probe is named for the Dutch physicist Christiaan\n",
      "    Huygens (1629-1695), who discovered Titan and first described the true\n",
      "    nature of Saturn's rings.\n",
      "\n",
      "\t Key Scheduled Dates for the Cassini Mission (VVEJGA Trajectory)\n",
      "\t -------------------------------------------------------------\n",
      "\t   10/06/97 - Titan IV/Centaur Launch\n",
      "\t   04/21/98 - Venus 1 Gravity Assist\n",
      "\t   06/20/99 - Venus 2 Gravity Assist\n",
      "\t   08/16/99 - Earth Gravity Assist\n",
      "\t   12/30/00 - Jupiter Gravity Assist\n",
      "\t   06/25/04 - Saturn Arrival\n",
      "\t   01/09/05 - Titan Probe Release\n",
      "\t   01/30/05 - Titan Probe Entry\n",
      "\t   06/25/08 - End of Primary Mission\n",
      "\t    (Schedule last updated 7/22/92)\n",
      "\n",
      "\n",
      "    GALILEO - Jupiter orbiter and atmosphere probe, in transit. Has returned\n",
      "    the first resolved images of an asteroid, Gaspra, while in transit to\n",
      "    Jupiter. Efforts to unfurl the stuck High-Gain Antenna (HGA) have\n",
      "    essentially been abandoned. JPL has developed a backup plan using data\n",
      "    compression (JPEG-like for images, lossless compression for data from\n",
      "    the other instruments) which should allow the mission to achieve\n",
      "    approximately 70% of its original objectives.\n",
      "\n",
      "\t   Galileo Schedule\n",
      "\t   ----------------\n",
      "\t   10/18/89 - Launch from Space Shuttle\n",
      "\t   02/09/90 - Venus Flyby\n",
      "\t   10/**/90 - Venus Data Playback\n",
      "\t   12/08/90 - 1st Earth Flyby\n",
      "\t   05/01/91 - High Gain Antenna Unfurled\n",
      "\t   07/91 - 06/92 - 1st Asteroid Belt Passage\n",
      "\t   10/29/91 - Asteroid Gaspra Flyby\n",
      "\t   12/08/92 - 2nd Earth Flyby\n",
      "\t   05/93 - 11/93 - 2nd Asteroid Belt Passage\n",
      "\t   08/28/93 - Asteroid Ida Flyby\n",
      "\t   07/02/95 - Probe Separation\n",
      "\t   07/09/95 - Orbiter Deflection Maneuver\n",
      "\t   12/95 - 10/97 - Orbital Tour of Jovian Moons\n",
      "\t   12/07/95 - Jupiter/Io Encounter\n",
      "\t   07/18/96 - Ganymede\n",
      "\t   09/28/96 - Ganymede\n",
      "\t   12/12/96 - Callisto\n",
      "\t   01/23/97 - Europa\n",
      "\t   02/28/97 - Ganymede\n",
      "\t   04/22/97 - Europa\n",
      "\t   05/31/97 - Europa\n",
      "\t   10/05/97 - Jupiter Magnetotail Exploration\n",
      "\n",
      "\n",
      "    HITEN - Japanese (ISAS) lunar probe launched 1/24/90. Has made\n",
      "    multiple lunar flybys. Released Hagoromo, a smaller satellite,\n",
      "    into lunar orbit. This mission made Japan the third nation to\n",
      "    orbit a satellite around the Moon.\n",
      "\n",
      "\n",
      "    MAGELLAN - Venus radar mapping mission. Has mapped almost the entire\n",
      "    surface at high resolution. Currently (4/93) collecting a global gravity\n",
      "    map.\n",
      "\n",
      "\n",
      "    MARS OBSERVER - Mars orbiter including 1.5 m/pixel resolution camera.\n",
      "    Launched 9/25/92 on a Titan III/TOS booster. MO is currently (4/93) in\n",
      "    transit to Mars, arriving on 8/24/93. Operations will start 11/93 for\n",
      "    one martian year (687 days).\n",
      "\n",
      "\n",
      "    TOPEX/Poseidon - Joint US/French Earth observing satellite, launched\n",
      "    8/10/92 on an Ariane 4 booster. The primary objective of the\n",
      "    TOPEX/POSEIDON project is to make precise and accurate global\n",
      "    observations of the sea level for several years, substantially\n",
      "    increasing understanding of global ocean dynamics. The satellite also\n",
      "    will increase understanding of how heat is transported in the ocean.\n",
      "\n",
      "\n",
      "    ULYSSES- European Space Agency probe to study the Sun from an orbit over\n",
      "    its poles. Launched in late 1990, it carries particles-and-fields\n",
      "    experiments (such as magnetometer, ion and electron collectors for\n",
      "    various energy ranges, plasma wave radio receivers, etc.) but no camera.\n",
      "\n",
      "    Since no human-built rocket is hefty enough to send Ulysses far out of\n",
      "    the ecliptic plane, it went to Jupiter instead, and stole energy from\n",
      "    that planet by sliding over Jupiter's north pole in a gravity-assist\n",
      "    manuver in February 1992. This bent its path into a solar orbit tilted\n",
      "    about 85 degrees to the ecliptic. It will pass over the Sun's south pole\n",
      "    in the summer of 1993. Its aphelion is 5.2 AU, and, surprisingly, its\n",
      "    perihelion is about 1.5 AU-- that's right, a solar-studies spacecraft\n",
      "    that's always further from the Sun than the Earth is!\n",
      "\n",
      "    While in Jupiter's neigborhood, Ulysses studied the magnetic and\n",
      "    radiation environment. For a short summary of these results, see\n",
      "    *Science*, V. 257, p. 1487-1489 (11 September 1992). For gory technical\n",
      "    detail, see the many articles in the same issue.\n",
      "\n",
      "\n",
      "    OTHER SPACE SCIENCE MISSIONS (note: this is based on a posting by Ron\n",
      "    Baalke in 11/89, with ISAS/NASDA information contributed by Yoshiro\n",
      "    Yamada (yamada@yscvax.ysc.go.jp). I'm attempting to track changes based\n",
      "    on updated shuttle manifests; corrections and updates are welcome.\n",
      "\n",
      "    1993 Missions\n",
      "\to ALEXIS [spring, Pegasus]\n",
      "\t    ALEXIS (Array of Low-Energy X-ray Imaging Sensors) is to perform\n",
      "\t    a wide-field sky survey in the \"soft\" (low-energy) X-ray\n",
      "\t    spectrum. It will scan the entire sky every six months to search\n",
      "\t    for variations in soft-X-ray emission from sources such as white\n",
      "\t    dwarfs, cataclysmic variable stars and flare stars. It will also\n",
      "\t    search nearby space for such exotic objects as isolated neutron\n",
      "\t    stars and gamma-ray bursters. ALEXIS is a project of Los Alamos\n",
      "\t    National Laboratory and is primarily a technology development\n",
      "\t    mission that uses astrophysical sources to demonstrate the\n",
      "\t    technology. Contact project investigator Jeffrey J Bloch\n",
      "\t    (jjb@beta.lanl.gov) for more information.\n",
      "\n",
      "\to Wind [Aug, Delta II rocket]\n",
      "\t    Satellite to measure solar wind input to magnetosphere.\n",
      "\n",
      "\to Space Radar Lab [Sep, STS-60 SRL-01]\n",
      "\t    Gather radar images of Earth's surface.\n",
      "\n",
      "\to Total Ozone Mapping Spectrometer [Dec, Pegasus rocket]\n",
      "\t    Study of Stratospheric ozone.\n",
      "\n",
      "\to SFU (Space Flyer Unit) [ISAS]\n",
      "\t    Conducting space experiments and observations and this can be\n",
      "\t    recovered after it conducts the various scientific and\n",
      "\t    engineering experiments. SFU is to be launched by ISAS and\n",
      "\t    retrieved by the U.S. Space Shuttle on STS-68 in 1994.\n",
      "\n",
      "    1994\n",
      "\to Polar Auroral Plasma Physics [May, Delta II rocket]\n",
      "\t    June, measure solar wind and ions and gases surrounding the\n",
      "\t    Earth.\n",
      "\n",
      "\to IML-2 (STS) [NASDA, Jul 1994 IML-02]\n",
      "\t    International Microgravity Laboratory.\n",
      "\n",
      "\to ADEOS [NASDA]\n",
      "\t    Advanced Earth Observing Satellite.\n",
      "\n",
      "\to MUSES-B (Mu Space Engineering Satellite-B) [ISAS]\n",
      "\t    Conducting research on the precise mechanism of space structure\n",
      "\t    and in-space astronomical observations of electromagnetic waves.\n",
      "\n",
      "    1995\n",
      "\tLUNAR-A [ISAS]\n",
      "\t    Elucidating the crust structure and thermal construction of the\n",
      "\t    moon's interior.\n",
      "\n",
      "\n",
      "    Proposed Missions:\n",
      "\to Advanced X-ray Astronomy Facility (AXAF)\n",
      "\t    Possible launch from shuttle in 1995, AXAF is a space\n",
      "\t    observatory with a high resolution telescope. It would orbit for\n",
      "\t    15 years and study the mysteries and fate of the universe.\n",
      "\n",
      "\to Earth Observing System (EOS)\n",
      "\t    Possible launch in 1997, 1 of 6 US orbiting space platforms to\n",
      "\t    provide long-term data (15 years) of Earth systems science\n",
      "\t    including planetary evolution.\n",
      "\n",
      "\to Mercury Observer\n",
      "\t    Possible 1997 launch.\n",
      "\n",
      "\to Lunar Observer\n",
      "\t    Possible 1997 launch, would be sent into a long-term lunar\n",
      "\t    orbit. The Observer, from 60 miles above the moon's poles, would\n",
      "\t    survey characteristics to provide a global context for the\n",
      "\t    results from the Apollo program.\n",
      "\n",
      "\to Space Infrared Telescope Facility\n",
      "\t    Possible launch by shuttle in 1999, this is the 4th element of\n",
      "\t    the Great Observatories program. A free-flying observatory with\n",
      "\t    a lifetime of 5 to 10 years, it would observe new comets and\n",
      "\t    other primitive bodies in the outer solar system, study cosmic\n",
      "\t    birth formation of galaxies, stars and planets and distant\n",
      "\t    infrared-emitting galaxies\n",
      "\n",
      "\to Mars Rover Sample Return (MRSR)\n",
      "\t    Robotics rover would return samples of Mars' atmosphere and\n",
      "\t    surface to Earch for analysis. Possible launch dates: 1996 for\n",
      "\t    imaging orbiter, 2001 for rover.\n",
      "\n",
      "\to Fire and Ice\n",
      "\t    Possible launch in 2001, will use a gravity assist flyby of\n",
      "\t    Earth in 2003, and use a final gravity assist from Jupiter in\n",
      "\t    2005, where the probe will split into its Fire and Ice\n",
      "\t    components: The Fire probe will journey into the Sun, taking\n",
      "\t    measurements of our star's upper atmosphere until it is\n",
      "\t    vaporized by the intense heat. The Ice probe will head out\n",
      "\t    towards Pluto, reaching the tiny world for study by 2016.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(docs[59])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 20 Newsgroup dataset contains lots of posts discussing and debating Christianity, as well.  Let's ask a question on this subject.\n",
    "\n",
    "#### Religious Question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Candidate Answer</th>\n",
       "      <th>Context</th>\n",
       "      <th>Confidence</th>\n",
       "      <th>Document Reference</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>is god incarnate</td>\n",
       "      <td><div>jesus isn ' t god ? when jesus returns some people may miss him ? what version of the bible do you read mike ? jesus  <font color='red'>is god incarnate</font>  (in flesh).</div></td>\n",
       "      <td>0.569719</td>\n",
       "      <td>6356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>the incarnation of the son</td>\n",
       "      <td><div> jesus is  <font color='red'>the incarnation of the son</font> .</div></td>\n",
       "      <td>0.328918</td>\n",
       "      <td>11661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>is god ' s son</td>\n",
       "      <td><div>) you seem to be suggesting the jesus  <font color='red'>is god ' s son</font>  in a physical sense, with the holy spirit as father and mary as mother.</div></td>\n",
       "      <td>0.069266</td>\n",
       "      <td>11661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>was god ' s only begotten son</td>\n",
       "      <td><div> the fact that jesus  <font color='red'>was god ' s only begotten son</font>  does not seem to me to have much meaning since god can beget as many sons as he wants to.</div></td>\n",
       "      <td>0.016456</td>\n",
       "      <td>11661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>jesus god only of the jews</td>\n",
       "      <td><div>which is more important : 1) the recorded word of jesus or 2) indications that you can deduce from the bible ? was  <font color='red'>jesus god only of the jews</font> , or god of all humankind of all race and sex ?</div></td>\n",
       "      <td>0.005702</td>\n",
       "      <td>7842</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "answers = qa.ask('Who was Jesus Christ?')\n",
    "qa.display_answers(answers[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we see different views on who Jesus was as debated and discussed in this document set.\n",
    "\n",
    "Finally, the 20 Newsgroup dataset also contains many groups about computing hardward and software.  Let's ask a technical support question.\n",
    "\n",
    "#### Technical Question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Candidate Answer</th>\n",
       "      <th>Context</th>\n",
       "      <th>Confidence</th>\n",
       "      <th>Document Reference</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>that not all display programs do gamma correction</td>\n",
       "      <td><div>the problem is  <font color='red'>that not all display programs do gamma correction</font> .</div></td>\n",
       "      <td>0.848456</td>\n",
       "      <td>13873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>if your viewer does not do gamma correction</td>\n",
       "      <td><div> <font color='red'>if your viewer does not do gamma correction</font> , then linear images will look too dark, and gamma corrected images will ok.</div></td>\n",
       "      <td>0.042678</td>\n",
       "      <td>13873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>altering the intensity in the hsv controls</td>\n",
       "      <td><div>  <font color='red'>altering the intensity in the hsv controls</font>  does not do the right thing, as it fails to take account of the effect gamma has on h and s.</div></td>\n",
       "      <td>0.040854</td>\n",
       "      <td>13873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>is gamma correction</td>\n",
       "      <td><div> this,  <font color='red'>is gamma correction</font>  (or the lack of it).</div></td>\n",
       "      <td>0.019406</td>\n",
       "      <td>13873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>if your viewer does not do gamma correction</td>\n",
       "      <td><div> <font color='red'>if your viewer does not do gamma correction</font> , then left hand ramp will have a long dark part and a short white part, and the point of equal brightness will be above the center.</div></td>\n",
       "      <td>0.013617</td>\n",
       "      <td>13873</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "answers = qa.ask('What causes computer images to be too dark?')\n",
    "qa.display_answers(answers[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like a lack of gamma correction can be one of the culprits."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deploying the QA System\n",
    "\n",
    "To deploy this system, the only state that needs to be persisted is the search index we initialized and populated in **STEP 1**.  Once a search index is initialized and populated, one can simply re-run from **STEP 2**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
